Linear Regression Stability Test Formula

The stability test uses linear regression to detect temporal trends in precipitation data:

Linear Model: $$Y_t = \beta_0 + \beta_1 \cdot t + \epsilon_t$$

Where:

  • $Y_t$ = precipitation value at time $t$
  • $\beta_0$ = intercept (initial value)
  • $\beta_1$ = slope (rate of change over time)
  • $t$ = time index (0, 1, 2, ..., n-1)
  • $\epsilon_t$ = random error term

Hypothesis Testing:

  • $H_0$: $\beta_1 = 0$ (no trend, data is stable)
  • $H_1$: $\beta_1 \neq 0$ (significant trend exists, data is not stable)

Test Statistic: $$t = \frac{\beta_1}{SE(\beta_1)}$$

P-value Calculation: $$p\text{-value} = 2 \times P(T_{n-2} \geq |t|)$$

Where:

  • $T_{n-2}$ follows a t-distribution with $(n-2)$ degrees of freedom
  • $n$ = number of data points
  • The factor of 2 accounts for the two-tailed test

Decision Rule:

  • If $p\text{-value} < \alpha$ (0.05): Reject $H_0$ → Station FAILS (has significant trend)
  • If $p\text{-value} \geq \alpha$ (0.05): Accept $H_0$ → Station PASSES (no significant trend)

Additional Metrics:

  • $R^2$: Coefficient of determination (proportion of variance explained by time)
  • Standard Error: $SE(\beta_1) = \sqrt{\frac{MSE}{\sum(t_i - \bar{t})^2}}$

Linear Regression Stability Test - Step by Step Process

This script performs a temporal stability test on precipitation data using linear regression analysis. Here's how it works:

Step 1: Data Preparation

  • Load precipitation data from Excel file containing multiple station columns
  • Extract numeric columns (station data) excluding the Date column
  • Create a time index array (0, 1, 2, ..., n-1) representing temporal sequence

Step 2: Linear Regression Analysis

For each station (numeric column):

  1. Fit Linear Model: Apply scipy.stats.linregress() to model precipitation vs. time
  2. Extract Parameters:
    • Slope (β₁): Rate of change in precipitation over time
    • Intercept (β₀): Initial precipitation value
    • R-squared: Proportion of variance explained by the temporal trend
    • P-value: Statistical significance of the trend
    • Standard Error: Uncertainty in the slope estimate

Step 3: Hypothesis Testing

  • Significance Level: α = 0.05 (95% confidence)
  • Decision Rule:
    • If p-value < 0.05 → PASSED (significant trend detected, data is NOT stable)
    • If p-value ≥ 0.05 → FAILED (no significant trend, data is stable)

Step 4: Visualization

Four subplots are generated:

  1. Slope Plot: Shows trend direction and magnitude (green = passed, red = failed)
  2. P-value Plot: Displays statistical significance with α threshold line
  3. R-squared Plot: Indicates strength of temporal relationship
  4. Summary Statistics: Overall test results and pass/fail counts

Step 5: Results Export

  • Detailed results are saved to Excel file
  • Output includes all regression parameters and test outcomes for each station

Note: In this context, "passing" means detecting a significant trend, which may indicate climate change or data instability.

Python
import pandas as pd
import numpy as np
from scipy import stats
Python
file_source = r"\Data\grids_precipitation_data.xlsx"
Python
# Load the data
df = pd.read_excel(file_source)

# Get numeric columns (excluding Date column)
numeric_cols = df.select_dtypes(include=[np.number]).columns.tolist()
Python
import matplotlib.pyplot as plt

# Define significance level
alpha = 0.05

# Perform linear regression for each station
regression_results = {}
time_index = np.arange(len(df))

for col in numeric_cols:
    # Perform linear regression
    slope, intercept, r_value, p_value, std_err = stats.linregress(time_index, df[col])
    
    regression_results[col] = {
        'slope': slope,
        'intercept': intercept,
        'r_squared': r_value**2,
        'p_value': p_value,
        'std_err': std_err
    }

# Convert regression_results dictionary to DataFrame
results_df = pd.DataFrame.from_dict(regression_results, orient='index')

# Add a column to indicate if the trend is statistically significant
results_df['significant'] = results_df['p_value'] < alpha
results_df['passed_test'] = results_df['significant']

# Create visualization
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Plot 1: Slope values with significance
ax1 = axes[0, 0]
colors = ['green' if sig else 'red' for sig in results_df['passed_test']]
ax1.bar(results_df.index.astype(str), results_df['slope'], color=colors, alpha=0.7)
ax1.axhline(y=0, color='black', linestyle='--', linewidth=0.5)
ax1.set_xlabel('Station')
ax1.set_ylabel('Slope (trend)')
ax1.set_title('Trend Slope by Station\n(Green=Passed, Red=Failed)')
ax1.tick_params(axis='x', rotation=90, labelsize=6)

# Plot 2: P-values
ax2 = axes[0, 1]
ax2.bar(results_df.index.astype(str), results_df['p_value'], color=colors, alpha=0.7)
ax2.axhline(y=alpha, color='blue', linestyle='--', linewidth=2, label=f'α={alpha}')
ax2.set_xlabel('Station')
ax2.set_ylabel('P-value')
ax2.set_title('P-values by Station')
ax2.legend()
ax2.tick_params(axis='x', rotation=90, labelsize=6)

# Plot 3: R-squared values
ax3 = axes[1, 0]
ax3.bar(results_df.index.astype(str), results_df['r_squared'], color=colors, alpha=0.7)
ax3.set_xlabel('Station')
ax3.set_ylabel('R-squared')
ax3.set_title('R-squared by Station')
ax3.tick_params(axis='x', rotation=90, labelsize=6)

# Plot 4: Summary statistics
ax4 = axes[1, 1]
ax4.axis('off')
passed = results_df['passed_test'].sum()
failed = len(results_df) - passed
summary_text = f"""
Test Summary (α = {alpha}):
━━━━━━━━━━━━━━━━━━━━━━
Stations Passed: {passed} ({passed/len(results_df)*100:.1f}%)
Stations Failed: {failed} ({failed/len(results_df)*100:.1f}%)
Total Stations: {len(results_df)}

Criteria: p-value < {alpha}
"""
ax4.text(0.1, 0.5, summary_text, fontsize=14, family='monospace',
         verticalalignment='center')

plt.tight_layout()
plt.show()

# Print detailed results
print("\n" + "="*70)
print("STATION TEST RESULTS")
print("="*70)
print(results_df[['slope', 'p_value', 'r_squared', 'passed_test']].to_string())
print("\n" + "="*70)
print(f"Stations that PASSED (significant trend): {passed}/{len(results_df)}")
print(f"Stations that FAILED (no significant trend): {failed}/{len(results_df)}")
print("="*70)
Python
# Save the results to Excel
output_file = r"\Data\linear_regression_results.xlsx"
results_df.to_excel(output_file, index=True)
print(f"Results saved to: {output_file}")